Abstract: We consider the accumulation of beneficial and deleterious mutations
in large asexual populations. The rate of adaptation is affected by the total
mutation rate, proportion of beneficial mutations, and population size N. We
show that regardless of mutation rates, as long as the proportion of beneficial
mutations is strictly positive, the adaptation rate is at least
$O(\log^{1-\delta} N)$, where $\delta$ can be any small positive number, if the
population size is sufficiently large. This shows that if the genome is modeled
as continuous, there is no limit to natural selection, i.e. the rate of
adaptation grows in N without bound.

1. Background and Introduction

We consider the accumulation of mutations in large asexual populations. The 
mutations that biological organisms accumulate over time can be classified into 
three categories: beneficial, neutral, and deleterious. Beneficial mutations in- 
crease the fitness of the individual carrying the mutation, while deleterious mu- 
tations decrease fitness; neutral mutations have no effect on fitness. Adaptation 
is driven by accumulation of beneficial mutations, but it is limited by clonal 
interference (clones that carry different beneficial mutations compete with each 
other and interfere with the other's growth in the population). Fisher and Muller 
argued for the importance of this effect as early as the 1930s (Fisher 1930, Muller
1964). Here we are concerned with the rate of adaptation, that is the rate of 
increase of mean fitness in the population. 

The simplest scenario one can consider is one in which a single beneficial 
mutation arises in an otherwise neutral population and no further mutations 
occur until the fate of that mutant is known. This situation is well understood. 
The most basic question one can ask is what is $p_{\mathrm{fix}}$, the fixation probability
of the mutation. This was settled by Haldane (1927), who showed that under
a discrete generation haploid model, if the selection coefficient associated with
the mutation is s, then under these circumstances $p_{\mathrm{fix}} \approx 2s$. In this case $p_{\mathrm{fix}}$
is almost independent of the population size, N.
When the mutation does fix, the process whereby it increases in frequency
from 1/N to 1 is known as a selective sweep. The duration of a selective sweep
is $O(\log(sN)/s)$ generations. If one assumes that the mutation rate per
individual per generation is $\mu$, then the overall mutation rate will be proportional
to population size and we see that for large populations the assumption that 
no new mutation will arise during the timecourse of the sweep breaks down. 
Instead one expects multiple overlapping sweeps. In an asexual population mu- 
tations can only be combined if they occur sequentially within the same lineage. 
This means that, on the one hand, alleles occurring on the same lineage can 
boost one another's chance of fixation, but on the other hand alleles occur- 
ring on distinct lineages competitively exclude one another. The net effect is to 
slow down the progress of natural selection. This is an extreme form of the Hill- 
Robertson effect. Hill and Robertson (1966) were the first to quantify the way in 
which linkage between two sites under selection in a finite population (whether 
sexually or asexually reproducing) limits the efficacy of natural selection. In a 
sexually reproducing population, recombination breaks down associations be- 
tween loci and so ameliorates the Hill-Robertson effect, suggesting an indirect 
selective force in favour of recombination. Further quantitative analysis of the 
interference between selected loci is provided by Barton (1995) who considers 
the probability of fixation of two favourable alleles in a sexually reproducing 
population. His method is only valid if the selection coefficient of the first bene- 
ficial mutation to arise is larger than that of the second. Yu et al. (2008) consider 
the same question in the general setting. The conclusion from both works is that 
fixation probabilities are reduced, sometimes drastically, because of interference 
between the two mutations. Furthermore, if the second mutation is stronger 
than the first, then Yu et al. (2008) show that the strength of interference can 
be strongly dependent on population size. In this work we do not consider the 
effects of recombination, since we only work with asexual populations. 

Since all beneficial mutations eventually become either extinct or ubiquitous 
in the population, the rate of adaptation, defined to be the rate of increase of 
the mean fitness of the population, is proportional to $\mu s p_{\mathrm{fix}} N$, where $\mu N$ is the
total number of beneficial mutations that occur to all individuals in the
population in a single generation and we assume $p_{\mathrm{fix}}$ to be the same for all beneficial
mutations, which is the case for the system in stationarity. If $p_{\mathrm{fix}}$ is independent
of population size, then we expect an adaptation rate of $O(N)$. However, as
explained above, the occurrence of simultaneous selective sweeps reduces $p_{\mathrm{fix}}$ and
so $p_{\mathrm{fix}}$ may not be $O(1)$. This leads to the following question: if one does not
limit the number of simultaneous selective sweeps, what is $p_{\mathrm{fix}}$, or equivalently,
what is the rate of adaptation? As $N \to \infty$, is the rate of adaptation finite
or does it increase without bound? There has been some controversy surround- 
ing this question. Some work (e.g. Barton & Coe 2007) suggests that there is 
an asymptotic limit to the rate of adaptation. Other authors (e.g. Rouzine et 
al. 2003, Wilke 2004, and Desai & Fisher 2007) argue that no such limit exists.
Here we study this problem in a mathematically rigorous framework. 

Previous work on this question has adopted two general approaches: (i) calculate
the fixation probability $p_{\mathrm{fix}}$ directly, and (ii) study the distribution of fitness
of all individuals in the population and ask how this distribution evolves with
time. The first approach was used in Gerrish & Lenski (1998), Wilke (2004) and
Barton & Coe (2007). Gerrish & Lenski (1998) were the first to present a quan- 
titative analysis of the rate of adaptation in the presence of clonal interference. 
They obtained approximate integral expressions for the fixation probability of a 
beneficial mutation and thus the expected rate of adaptation. Orr (2000) gener- 
alised the results of Gerrish & Lenski (1998) to include the effects of deleterious 
mutations. Wilke (2004) combined the works of Gerrish & Lenski (1998) and 
Orr (2000) to obtain approximate expressions for the adaptation rate that grow 
logarithmically or doubly logarithmically for large N. In all three works, the 
authors used a sequence of approximations before arriving at an expression for 
the fixation probability or the adaptation rate. It seems to be highly non-trivial 
to turn any of these approximation steps into a rigorous mathematical argument 
and so we do not follow their approaches here. 

The second approach, to consider the distribution of fitness in the population,
was used in Rouzine et al. (2003), Brunet et al. (2006), and Rouzine et
al. (2007). As in the work described in the last paragraph, Rouzine et al. (2003) 
take fitness effects to be additive, but whereas before the selection coefficient of 
each new mutation was chosen from a probability distribution, now all selection 
coefficients are taken to be equal. In this setting a beneficial and a deleterious 
mutation carried by the same individual cancel one another out and an indi- 
vidual's fitness can be characterised by the net number of beneficial mutations 
which it carries (which may be negative). Writing $P_k$ for the proportion of
individuals with fitness equivalent to k beneficial mutations, $\{P_k\}_{k \in \mathbb{Z}}$ forms a type
of traveling wave whose shape remains basically unchanged over time. The posi- 
tion of the wave moves to the left or the right on the fitness axis, depending on 
whether the adaptation rate is positive or negative. This is similar to traveling 
waves arising from reaction-diffusion equations in the PDE literature (see e.g. 
Chapter 15 of Taylor (1996)). In the current setting, however, the shape of the 
wave actually fluctuates stochastically even after a long time. So the wave can 
be regarded as a stochastic traveling wave, and its speed is proportional to the 
rate of adaptation. Rouzine et al. (2003) studied a multilocus model that does 
not include recombination but does include beneficial, deleterious, and compen- 
sating mutations. They found that the rate of adaptation (i.e. the speed of the 
traveling wave) asymptotically depends logarithmically on population size N, 
which is consistent with results of in vitro studies of a type of RNA virus in 
Novella et al. (1995) and Novella et al. (1999). Rouzine et al. (2007) present
the same approach but with more detailed derivations and improved treatments 
of the stochastic edge. 

Desai and Fisher (2007) also adopt the traveling wave approach. Their
method of studying the adaptation rate, however, differs from that of Rouzine
et al. (2003) and Rouzine et al. (2007) in that they consider the fitness variation
of the population to be in mutation-selection balance, and ask how much
variance in fitness the population can maintain while this variation is being selected
on. The conclusion they reach is that this variation (hence the adaptation rate) 
increases logarithmically with both population size and mutation rate. 

Brunet et al. (2006) study a model in which each of the N individuals in the
population gives birth to k offspring, each of which has a fitness that differs from 
the fitness of its parent by a random amount and finally the N fittest individuals 
are used to form the next generation. This model resembles artificial selection, 
rather than natural selection, but it may be easier to study because the density 
of individuals of a certain fitness in the next generation has a kind of local 
dependence on that density in the current generation. This is quite different 
from the behaviour considered in Rouzine et al. (2003) and our work in this
article, where the density of individuals of a given fitness depends on the whole 
fitness distribution of the parental population. 

This work originally arose from discussions with Nick Barton and Jonathan 
Coe which focused on limits to the rate of adaptation when all mutations are 
beneficial. In reality, most mutations are either neutral or deleterious. In partic- 
ular, if all mutations in an asexual population were deleterious, then the popu- 
lation would irreversibly accumulate deleterious mutations, a process known as 
Muller's ratchet. The first mathematically rigorous analysis of Muller's ratchet 
is due to Haigh (1978). There a Wright-Fisher model is formulated that incor- 
porates the effects of selection and mutation. Again all mutations carry equal 
weight so that individuals can be classified according to how many mutations 
they carry. Haigh (1978) showed that if the population size is infinite (so that 
the dynamics of the model become deterministic) then there is a stationary dis- 
tribution. In the finite population case, however, this is not the case. At any 
given time there is a fittest class, corresponding to those individuals carrying 
the smallest number of mutations, but this class will eventually be lost due 
to genetic drift (the randomness in the reproduction mechanism). This loss is 
permanent since there is no beneficial or back mutation to create a class fitter 
than the current fittest class. The next fittest class then becomes the fittest 
class, but that will be lost eventually as well and the entire population grows 
inexorably less fit. Higgs & Woodcock (1995) derived a set of moment equations 
for Haigh's model but these are not closed and so are hard to analyse. Instead, 
their results rely mainly on simulations. Stephan et al. (1993) and Gordo & 
Charlesworth (2000) use (slightly different) one-dimensional diffusions to ap- 
proximate the size of the fittest class. Etheridge et al. (2007) go much further 
along this line (and provide a more thorough review of the literature than that 
included here). They conjecture and provide justification for a phase transition 
and power law behaviour in the rate of the ratchet. But in spite of the very 
considerable body of work on Muller's ratchet, even a rigorous expression for 
the rate of decline in mean fitness of the population remains elusive. 

Muller's ratchet caricatures the evolution of a population in which there is 
no recombination and no beneficial mutation. Such a population is doomed 
to become progressively less and less fit. So how can a species overcome the 
ratchet? If it reproduces sexually, then recombination of parental chromosomes 
can create offspring that are fitter than either parent and so Muller's ratchet 
has been proposed as an explanation for the evolution of sexual reproduction 
(e.g. Muller 1964, Felsenstein 1974). But not all populations reproduce sexually. 
Another mechanism which has the potential to overcome Muller's ratchet is the 
presence of beneficial mutations, and it is this mechanism that we shall consider
here. More specifically, we pose the following question: with both beneficial and
deleterious mutations, does a sufficiently large population overcome Muller's
ratchet?

Fig 1. Adaptation rate against population size, from top to bottom, for q = 4%, 2%, 1% and
0.2%, $\mu = 0.01$, and s = 0.01. Circles represent data points obtained from simulation, q
is the probability that a mutation is advantageous, and vertical bars represent one standard
deviation.

The conclusion we reach, through both non-rigorous (§3) and rigorous
(Theorem 4.6) approaches, is the following: as long as the proportion of beneficial
mutations is strictly positive, the rate of adaptation is roughly $O(\log N)$ for
large N, where N is the population size and time is measured in generations.
This shows that even with a tiny proportion of beneficial mutations, a large
enough population size will yield a positive adaptation rate, in which case the
entire population grows fitter at a high rate and Muller's ratchet is overcome.
It also shows, in particular, that the rate of adaptation grows without bound as
$N \to \infty$ in the all-mutations-beneficial case. This is consistent with the findings
of Rouzine et al. (2003) and Wilke (2004). 

Figure 1 plots the adaptation rate against log population size from simulation
results of the model we consider in this article. We observe that for each set of
parameters q, $\mu$ and s, the rate of adaptation is roughly proportional to $\log N$
and small population sizes may result in negative adaptation rates. Furthermore,
larger q results in a higher adaptation rate for fixed $\mu$ and s. The upshot is that
with $\mu$ and s held constant, a smaller proportion of beneficial mutations needs
a larger population size for Muller's ratchet to be overcome.

In the model we study here the selection coefficient s is held fixed as $N \to \infty$.
This is known as a 'strong selection' model. Our interest is in the behaviour of
the model for very large N. It is not clear in this setting how to pass to an
infinite population limit and so we must work with a model based on discrete
individuals. An alternative model, the so-called weak selection model, is used 
to address behaviour of very large populations when Ns is not too large. By 
fixing Ns (as opposed to s) one can pass to an infinite population limit. The 
limiting model comprises a countably infinite system of coupled stochastic dif- 
ferential equations for the frequencies of individuals of different fitnesses within 
the population. Preliminary calculations for this model are presented in Yu and 
Etheridge (2008). 

This work is organised as follows. In §2 we formulate our model. In the
biological literature one would expect to see a Wright-Fisher model, but since
we are interested in large populations, we expect the same results for the much
more mathematically tractable Moran particle model which we describe. We
also perform some preliminary calculations. In §3 we present a non-rigorous
argument that leads to an asymptotic adaptation rate of roughly $O(\log N)$. In
§4 we present and prove our main rigorous result that establishes a lower bound
of $\log^{1-\delta} N$ for any $\delta > 0$ on the adaptation rate. And finally in §5 we prove
the supporting lemmas required for the proof of our main theorem.

2. The Finite Population Moran Model 

We assume constant population size N. For each $N \in \mathbb{N}$, let $X_i(t) \in \mathbb{Z}$,
$i = 1, \ldots, N$, denote the fitness type of the $i$th individual, defined to be the number
of beneficial mutations minus the number of deleterious mutations carried by
the individual. For $k \in \mathbb{Z}$, let $P_k(t)$ denote the proportion of individuals that
have fitness type k at time t, i.e.

$$P_k(t) = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}_{\{X_i(t) = k\}}.$$

We use $\mathcal{P}^{(N)}(\mathbb{Z})$ to denote the space of probability measures p on $\mathbb{Z}$ formed by
N point masses each with weight 1/N, and define

$$S^{(N)} = \mathcal{P}^{(N)}(\mathbb{Z})$$

to be the state space for $P(t) = (P_k(t))_{k \in \mathbb{Z}}$. For $p \in S^{(N)}$, we define $p_k = p(\{k\})$ and

$$p_{[k,l]} = \sum_{i=k}^{l} p_i, \qquad m(p) = \langle k, p \rangle = \sum_{k \in \mathbb{Z}} k p_k, \qquad c_n(p) = \sum_{k \in \mathbb{Z}} (k - m(p))^n p_k. \tag{1}$$

In particular, $m(p)$ is the mean fitness of the population, and $c_2(p) = \langle k^2, p \rangle - \langle k, p \rangle^2$
is the 2nd central moment of the population fitness, i.e. its variance. We
sometimes abuse notation and use P to denote the probability mass function of
different fitness types associated with the probability measure P.

The model of interest is one where each individual accumulates beneficial
mutations at a Poisson rate $q\mu$ and deleterious mutations at rate $(1 - q)\mu$. We
assume a so-called infinitely-many-loci model where each mutation is assumed to
be new and occur at a different locus on the genome. All individuals experience
selection effects via a selection mechanism (which introduces a drift reflecting
the differential reproductive success based on fitness) and the effect of genetic
drift via a resampling mechanism. The mechanisms of this model are described
below:

1. Mutation: For each individual i, a mutation event occurs at rate $\mu$. With
probability $1 - q$, $X_i$ changes to $X_i - 1$ and with probability q, $X_i$ changes
to $X_i + 1$.

2. Selection: For each pair of individuals (i, j), at rate $\frac{s}{N}(X_i - X_j)^+$,
individual i replaces individual j.

3. Resampling: For each pair of individuals (i, j), at rate $\frac{1}{N}$, individual i
replaces individual j.

This model has a time scale such that one unit of time corresponds roughly to 
one generation. A more sophisticated model should consider mutations that have 
a distribution of fitness effects, e.g. an independent exponentially distributed se- 
lective advantage associated with each new beneficial mutation as proposed by 
Gillespie (1991). Recent work by Hegreness et al. (2006), however, suggests that 
in models where beneficial mutations have a distribution of fitness advantages, 
evolutionary dynamics, e.g. the distribution of successful mutations which ul- 
timately determines the rate of adaptation, can be reasonably described by an 
equivalent model where all beneficial mutations confer the same fitness
advantage. One can also describe the mechanisms in the above model in terms of the
$P_k$'s:

1. Mutation: for any $k \in \mathbb{Z}$, at rate $(1 - q)\mu N P_k$, $P_k$ decreases by $\frac{1}{N}$ and $P_{k-1}$
increases by $\frac{1}{N}$; at rate $q\mu N P_k$, $P_k$ decreases by $\frac{1}{N}$ and $P_{k+1}$ increases by
$\frac{1}{N}$.

2. Selection: for any pair of $k, l \in \mathbb{Z}$ such that $k > l$, at rate $s(k - l) N P_k P_l$,
$P_k$ increases by $\frac{1}{N}$ and $P_l$ decreases by $\frac{1}{N}$.

3. Resampling: for any pair of $k, l \in \mathbb{Z}$, at rate $N P_k P_l$, $P_k$ increases by
$\frac{1}{N}$ and $P_l$ decreases by $\frac{1}{N}$.
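
The three mechanisms above can be simulated event by event. The following is a minimal sketch, not the authors' simulation code: it uses the standard Gillespie (stochastic simulation) algorithm, which the text does not prescribe, and the names `moran_step` and `simulate` are hypothetical. The state is stored as a dictionary mapping each fitness type k to the count $N P_k$. Resampling is implemented by drawing the parent and the replaced individual independently, so a pair of equal types is drawn at rate $N P_k^2$ and leaves the state unchanged.

```python
import random


def _move(counts, src, dst):
    """Move one individual from fitness type src to fitness type dst."""
    counts[src] -= 1
    if counts[src] == 0:
        del counts[src]
    counts[dst] = counts.get(dst, 0) + 1


def _pick_type(counts, rng):
    """Fitness type of a uniformly chosen individual (type k has probability P_k)."""
    types = list(counts)
    return rng.choices(types, weights=[counts[k] for k in types])[0]


def moran_step(counts, N, mu, q, s, rng):
    """Apply one event to counts (type -> N * P_k) and return the waiting time."""
    types = list(counts)
    mut_rate = mu * N          # sum over k of mu * N * P_k
    res_rate = float(N)        # sum over ordered (k, l) of N * P_k * P_l
    sel = [(k, l, s * (k - l) * counts[k] * counts[l] / N)
           for k in types for l in types if k > l]
    sel_rate = sum(r for _, _, r in sel)
    total = mut_rate + res_rate + sel_rate
    dt = rng.expovariate(total)
    u = rng.random() * total
    if u < mut_rate:
        k = _pick_type(counts, rng)
        _move(counts, k, k + 1 if rng.random() < q else k - 1)
    elif u < mut_rate + res_rate:
        k, l = _pick_type(counts, rng), _pick_type(counts, rng)
        _move(counts, l, k)    # no net change when k == l
    elif sel_rate > 0:
        k, l, _ = rng.choices(sel, weights=[r for _, _, r in sel])[0]
        _move(counts, l, k)    # the fitter type k replaces type l
    return dt


def simulate(N, mu, q, s, t_end, seed=0):
    """Run from a monomorphic population at type 0 until time t_end."""
    rng = random.Random(seed)
    counts = {0: N}
    t = 0.0
    while t < t_end:
        t += moran_step(counts, N, mu, q, s, rng)
    mean = sum(k * n for k, n in counts.items()) / N
    return counts, mean
```

Dividing the returned mean fitness by `t_end` gives a crude single-run estimate of the adaptation rate; averaging over many runs and discarding an initial transient would be needed before comparing with any plotted values.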

We use (P, X) to denote the process evolving under the above mechanism, where 
X describes the fitness types of the exchangeable individuals and P describes 
the empirical measure formed by the fitness types of these individuals. If there 
is no confusion, we drop X and denote the process simply by P. The main result 
of this work, Theorem 4.6, states that under the above model, the mean fitness
increases at a rate of at least $O(\log^{1-\delta} N)$ for any $\delta > 0$ after a sufficiently long
time.

Remark 2.1. Notice that the resampling acts on ordered pairs, so that the
overall rate at which an individual is affected by a resampling event is $2(N-1)/N \approx 2$ and
at such an event it has equal chance of reproducing or dying. It would be more 
usual to have resampling at half this rate, but this choice of timescale does not 
change the results and will save us many factors of two later. 

Often one combines the resampling and selection into a single term. Each 
pair of individuals is involved in a reproduction event at some constant rate and 
the effect of selection is then that it is more likely to be the fitter individual that 
reproduces. Since s is typically rather small, our simpler formulation is a very 
small perturbation of this model and again the statement of our results would 
not be changed in that framework. 

Remark 2.2. We take the selection mechanism to be additive instead of
multiplicative, i.e. the fitness type of an individual with k beneficial mutations is
$1 + sk$ instead of $(1 + s)^k$. Even though $(1 + s)^k \approx 1 + sk$ is only valid for small
s and k, $(1 + s)^k \ge 1 + sk$ holds for all $s \in [-1, \infty)$, thus our main result of
a lower bound on the rate of adaptation also holds for multiplicative selection
effects.

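One can construct the process X(t) using Poisson random measures and
Poisson processes. More specifically, let $\ell$ denote the Lebesgue measure on $\mathbb{R}$. For
each $i \in \{1, \ldots, N\}$, let $\Lambda^{b}_i$ and $\Lambda^{d}_i$ be independent Poisson processes with intensities
$q\mu$ and $(1 - q)\mu$, respectively. For each pair i, j, let $\Lambda^{s}_{ij}$ be a Poisson random
measure on $\mathbb{R}_+ \times \mathbb{R}_+$ with intensity measure $\frac{1}{N} \ell \times \ell$. And, for each pair i, j,
let $\Lambda^{r}_{ij}$ be a Poisson process with intensity $\frac{1}{N}$. Then $X_i$ satisfies the following
jump equation:

$$\begin{aligned} X_i(t) = X_i(0) &+ \Lambda^{b}_i(t) - \Lambda^{d}_i(t) + \sum_{j} \int_0^t \big( X_j(u-) - X_i(u-) \big) \, d\Lambda^{r}_{ij}(u) \\ &+ \sum_{j} \int_0^t \int_0^\infty \big( X_j(u-) - X_i(u-) \big) \mathbf{1}_{\{\xi \le s (X_j(u-) - X_i(u-))^+\}} \, \Lambda^{s}_{ij}(du, d\xi). \end{aligned} \tag{2}$$

In the above, jumps of $\Lambda^{r}_{ij}$ give possible times when the type of individual
i is replaced by that of individual j due to the resampling mechanism; jumps
of $\Lambda^{s}_{ij}$ give possible times when the type of individual i is replaced by that of
individual j due to the selection mechanism; and jumps of $\Lambda^{b}_i$ and $\Lambda^{d}_i$ give
possible times when the type of individual i increases and decreases by 1 due to
the beneficial and deleterious mutation mechanisms, respectively.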
In terms of $P_k$, we have

$$dP_k = \Big[ \mu \big( q P_{k-1} - P_k + (1 - q) P_{k+1} \big) + s \sum_{l \in \mathbb{Z}} (k - l) P_k P_l \Big] dt + dM^{P,1}_k + dM^{P,2}_k, \tag{3}$$

where $M^{P,1}$ and $M^{P,2}$ are orthogonal martingales, the first arising from the
(compensated) mutation mechanism and the second from the resampling and
selection mechanisms.

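We define the conditional quadratic variation of an $L^2$-martingale $(M_t)_{t \ge 0}$ to
be the unique previsible process $\langle M \rangle(t)$ that makes $M(t)^2 - M(0)^2 - \langle M \rangle(t)$ a
martingale. See e.g. Chapters II.6 and III.5 of Protter (2003). With this notation,
following the method of, for example, Ikeda & Watanabe (1981), §II.3.9 we
obtain

$$\begin{aligned} \langle M^{P,1}_k \rangle(t) &= \frac{\mu}{N} \int_0^t \big( q P_{k-1}(u) + P_k(u) + (1 - q) P_{k+1}(u) \big) \, du, \\ \langle M^{P,1}_k, M^{P,1}_{k-1} \rangle(t) &= -\frac{\mu}{N} \int_0^t \big( q P_{k-1}(u) + (1 - q) P_k(u) \big) \, du, \\ \langle M^{P,1}_k, M^{P,1}_l \rangle(t) &= 0 \quad \text{if } |k - l| \ge 2, \\ \langle M^{P,2}_k, M^{P,2}_l \rangle(t) &= -\frac{1}{N} \int_0^t (2 + s|k - l|) P_k(u) P_l(u) \, du \quad \text{if } k \ne l. \end{aligned} \tag{4}$$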

With the expressions in (3) and (4), we can write the martingale decomposition
of the mean $m(P(t)) = \sum_k k P_k(t)$ in the notation of (1) as follows

$$\begin{aligned} m(P(t)) &= m(P(0)) + \mu \int_0^t \sum_k k \big[ q P_{k-1}(u) - P_k(u) + (1 - q) P_{k+1}(u) \big] \, du \\ &\quad + s \int_0^t \sum_{k,l \in \mathbb{Z}} k (k - l) P_k(u) P_l(u) \, du + M^{P,m}(t) \\ &= m(P(0)) + \mu (2q - 1) t + s \int_0^t c_2(P(u)) \, du + M^{P,m}(t), \end{aligned}$$

where $M^{P,m}$ is a martingale, or in differential notation,

$$dm(P) = \big( \mu(2q - 1) + s c_2(P) \big) \, dt + dM^{P,m}. \tag{5}$$
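
The simplification of the drift of the mean to $\mu(2q - 1) + s c_2(P)$ can be checked numerically on any finitely supported distribution. The sketch below is illustrative only; the function names `mean_drift` and `closed_form` are hypothetical. It sums k times the mutation and selection drift of $P_k$ over all k and compares the result with the closed form.

```python
def mean_drift(p, mu, q, s):
    """Sum over k of k times the drift of P_k; p maps type k to its proportion p_k."""
    m = sum(k * pk for k, pk in p.items())

    def g(j):
        return p.get(j, 0.0)

    total = 0.0
    # One extra site on each side, since mutation moves mass to neighbouring types.
    for k in range(min(p) - 1, max(p) + 2):
        mutation = mu * (q * g(k - 1) - g(k) + (1 - q) * g(k + 1))
        selection = s * (k - m) * g(k)
        total += k * (mutation + selection)
    return total


def closed_form(p, mu, q, s):
    """The drift mu * (2q - 1) + s * c_2(p) of the mean fitness."""
    m = sum(k * pk for k, pk in p.items())
    c2 = sum((k - m) ** 2 * pk for k, pk in p.items())
    return mu * (2 * q - 1) + s * c2
```

For example, `mean_drift({-1: 0.2, 0: 0.5, 2: 0.3}, 0.01, 0.04, 0.01)` and `closed_form` with the same arguments agree up to floating-point rounding.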

3. A Non-rigorous Argument 

In this section, we give a non-rigorous argument that leads to an asymptotic
adaptation rate of roughly $O(\log N)$, as long as q is strictly positive and
regardless of the selection and mutation parameters. A rigorous argument in §4 will
establish a lower bound of $O(\log^{1-\delta} N)$ on the adaptation rate.

Our non-rigorous approach is similar to that of Rouzine et al. (2003). We 
assume the 'bulk' of the wave, i.e. at k's not too far away from the mean fitness, 
behaves like a deterministic traveling wave and obtain an approximate
expression for the shape of this wave. More specifically, we obtain a set of equations
satisfied by all central moments of the distribution P, which will dictate that the 
wave is approximately Gaussian. There is, however, an infinite family of solu- 
tions to these equations, parameterised by the variance of P, which ultimately 
determines the wave speed. To determine the correct wave speed for a given 
parameter set (i.e. population size, mutation and selection coefficients, and the 
proportion of beneficial mutations), we use the essentially stochastic behaviour 
at the front of the wave to calculate the wave speed. The answer we obtain from 
both calculations, i.e. using the 'bulk' and the front of the wave, must be the 
same. This constraint will yield an approximate expression for the adaptation 
rate. 

With all martingale terms in (3) of order $P/N$, the effect of noise on $P_k$ can
be considered to be quite small if $P_k$ is much larger than 1/N. For k's where $P_k$
is in this range, we have from (3),

$$\begin{aligned} dP_k &\approx \Big[ \mu \big( q P_{k-1} - P_k + (1 - q) P_{k+1} \big) + s \sum_{l \in \mathbb{Z}} (k - l) P_k P_l \Big] dt \\ &= \big[ \mu \big( q P_{k-1} - P_k + (1 - q) P_{k+1} \big) + s (k - m(P)) P_k \big] dt. \end{aligned} \tag{6}$$

This is similar to Equation (2) in Rouzine et al. (2003). 

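If we assume that $\{P_k\}_{k \in \mathbb{Z}}$ evolves according to this deterministic system,
then we can calculate the central moments via the Laplace transform

$$\psi(\theta; p) = \sum_k e^{\theta(k - m(p))} p_k.$$

Writing $\psi(\theta)$ for $\psi(\theta; P)$, we have

$$d\psi(\theta) = \sum_k e^{\theta(k - m(P))} \, dP_k - \theta \sum_k e^{\theta(k - m(P))} P_k \, dm(P).$$

Furthermore, we can obtain from (5)

$$dm(P) \approx \big( \mu(2q - 1) + s c_2(P) \big) \, dt. \tag{7}$$

Therefore

$$\begin{aligned} d\psi(\theta) &\approx \Big[ \mu \sum_k e^{\theta(k - m(P))} \big( q P_{k-1} - P_k + (1 - q) P_{k+1} \big) + s \sum_k e^{\theta(k - m(P))} (k - m(P)) P_k \\ &\qquad - \theta \sum_k e^{\theta(k - m(P))} P_k \big( \mu(2q - 1) + s c_2(P) \big) \Big] dt \\ &= \big[ \psi(\theta) \big( \mu (q e^{\theta} - 1 + (1 - q) e^{-\theta}) - \theta (\mu(2q - 1) + s c_2(P)) \big) + s \psi'(\theta) \big] dt \\ &= \big[ \psi(\theta) \big( \mu (q e^{\theta} - 1 + (1 - q) e^{-\theta} - \theta(2q - 1)) - \theta s c_2(P) \big) + s \psi'(\theta) \big] dt. \end{aligned}$$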

We observe that the term with coefficient $\mu$ is $O(\theta^2)$, thus for small $\theta$, the effect of
the mutation mechanism on the centred wave is relatively small compared to the
selection mechanism. We drop the terms arising from the mutation mechanism
to obtain

$$d\psi(\theta) \approx s \big[ -\psi(\theta) \theta c_2(P) + \psi'(\theta) \big] dt.$$

Differentiating this repeatedly and using the fact that $c_n(P) = \psi^{(n)}(0; P)$ for
$n \ge 2$ we obtain the following approximate system for the central moments $c_n$:

$$dc_n(P) \approx s \big( c_{n+1}(P) - n c_2(P) c_{n-1}(P) \big) \, dt.$$

If we assume the shape of the wave to be roughly deterministic and stationary,
then setting the expressions on the right hand side to zero we see that the central
moments of P satisfy

$$c_n(P) = \begin{cases} 0, & \text{if } n \ge 3 \text{ is odd}, \\ \dfrac{n!}{2^{n/2} (n/2)!} \, c_2(P)^{n/2}, & \text{if } n \ge 2 \text{ is even}, \end{cases}$$

which are the central moments of the normal distribution with variance $c_2(P)$. Hence
P is approximately Gaussian, but the variance is not determined.
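
Setting the right hand side of the moment system to zero gives the recursion $c_{n+1} = n c_2 c_{n-1}$ for $n \ge 2$, started from $c_1 = 0$ and a free value of $c_2$. The short sketch below (function names are hypothetical, chosen for illustration) iterates this recursion and compares it with the closed-form central moments of a normal distribution.

```python
from math import factorial, isclose


def moments_from_recursion(c2, n_max):
    """Iterate c_{n+1} = n * c2 * c_{n-1} from c_0 = 1, c_1 = 0, c_2 = c2."""
    c = {0: 1.0, 1: 0.0, 2: float(c2)}
    for n in range(2, n_max):
        c[n + 1] = n * c2 * c[n - 1]
    return c


def gaussian_central_moment(c2, n):
    """n-th central moment of a normal distribution with variance c2."""
    if n % 2 == 1:
        return 0.0
    half = n // 2
    return factorial(n) / (2 ** half * factorial(half)) * c2 ** half


def recursion_matches_gaussian(c2, n_max=10):
    """True if the recursion reproduces the Gaussian moments up to order n_max."""
    c = moments_from_recursion(c2, n_max)
    return all(isclose(c[n], gaussian_central_moment(c2, n), rel_tol=1e-12, abs_tol=1e-12)
               for n in range(2, n_max + 1))
```

The recursion leaves $c_2$ free, which is the numerical counterpart of the statement that the variance is not determined by the bulk of the wave alone.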

We can use this information to guess at the asymptotic variance of the wave,
which will also, through Equation (7), yield an expression for the asymptotic
rate of adaptation. We follow §3 of Yu & Etheridge (2008) and assume that P
is approximately Gaussian with mean $m(P)$ and variance $b^2$, and the 'front' of
the wave is approximately where the level of P falls to $1/N$. If the front of the
wave is at $K + m(P)$, then

$$\frac{1}{\sqrt{2\pi b^2}} e^{-K^2/2b^2} = \frac{1}{N},$$

hence

$$K \approx b \sqrt{2 \log N}. \tag{8}$$

To estimate how long it takes the wave to advance by one, we suppose that a
single individual is born at $K + m(P)$ at time zero and estimate the time it
takes for an individual to be born at $K + m(P) + 1$. Let $Z(t)$ be the number of
individuals at site $K + m(P)$ at time t. Note that these are the fittest individuals
in the population. According to (6), until a beneficial mutation falls on site
$K + m(P)$, and ignoring beneficial mutations occurring to type $K - 1 + m(P)$,
$Z(t)$ increases exponentially at rate $sK - \mu$, i.e.

$$Z(t) \approx e^{(sK - \mu)t}. \tag{9}$$

As the population at site $K + m(P)$ grows, each individual accumulates
beneficial and deleterious mutations at rates $q\mu$ and $(1 - q)\mu$ respectively. The
occurrence of the first beneficial mutation will result in the advance of the wavefront.
Using (9), we deduce that the probability that no beneficial mutation occurs to
any individuals with fitness type $K + m(P)$ by time t is

$$\exp\Big\{ -q\mu \int_0^t Z(u) \, du \Big\} = \exp\Big\{ -\frac{q\mu}{sK - \mu} \big( e^{(sK - \mu)t} - 1 \big) \Big\}.$$

This probability becomes small once t is of order $\log(sK - \mu)/(sK - \mu)$,
which gives a wave speed of $(sK - \mu)/\log(sK - \mu)$.

Now we equate the results of our two calculations for the wave speed. By (7),
the wave speed is $\mu(2q - 1) + s c_2(P) = \mu(2q - 1) + s b^2 \approx \mu(2q - 1) + sK^2/(2 \log N)$,
using the equality involving K and b in (8). This leads to the following
consistency condition:

$$\frac{sK - \mu}{\log(sK - \mu)} \approx \mu(2q - 1) + \frac{sK^2}{2 \log N}.$$

For large K, this approximately reduces to

$$K \log(sK) = 2 \log N. \tag{10}$$

It is easy to see that K must be smaller than $\log N$ but larger than any fractional
power of $\log N$. In fact, (10) is a transcendental equation whose solution can be
written as $K = 2 \log N / W(2 s \log N)$, where $W : [0, \infty) \to [0, \infty)$ is the inverse function
of $z \mapsto z e^z$. Corless et al. (1996) call the function W the Lambert W function,
and give useful asymptotic expansions of this function near 0 and $\infty$, e.g.
Equation (4.20) of Corless et al. (1996). In particular, the two leading terms of
this expansion are

$$W(z) = \log z - \log \log z + \ldots,$$

which shows that $K = (2 \log N / \log \log N)(1 + o(1))$ and the leading term of the
wave speed $sK^2/(2 \log N)$ is $2 s \log N / (\log \log N)^2$. Our rigorous results in §4 will show that the
rate of adaptation is asymptotically greater than any fractional power of $\log N$
as $N \to \infty$.
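
Equation (10) can also be solved numerically without special functions. The sketch below is an illustration under stated assumptions, not part of the original argument: it finds K by bisection on the branch $sK > 1$, where the left hand side of (10) is increasing, and evaluates the wave speed through $sK^2/(2 \log N)$, i.e. with the terms involving $\mu$ dropped as in the reduction to (10). The names `front_position` and `wave_speed` are hypothetical.

```python
import math


def front_position(N, s, tol=1e-12):
    """Solve K * log(s * K) = 2 * log(N) for K by bisection on the branch s * K > 1."""
    target = 2.0 * math.log(N)

    def f(K):
        return K * math.log(s * K) - target

    lo = 1.0 / s               # f(lo) = -target, which is negative for N > 1
    hi = lo
    while f(hi) < 0:           # expand the bracket until the root is enclosed
        hi *= 2.0
    while hi - lo > tol * hi:  # f is increasing on s * K > 1, so bisection applies
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def wave_speed(N, s):
    """Leading-order wave speed s * K^2 / (2 * log N), neglecting terms in mu."""
    K = front_position(N, s)
    return s * K * K / (2.0 * math.log(N))
```

Since (10) is only the large-K reduction of the consistency condition, values computed for moderate N (or for small s, where sK is close to 1) should be read as illustrations of the formula rather than as predictions of the adaptation rate.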

There are two critical components in the non-rigorous argument that we 
presented in this section: (i) the Gaussian shape of the wave when N is very 
large, and (ii) the relation between the speed of the mean and that of the 
front of the wave. The second component above has a rigorous counterpart in 
Proposition 4.2, but we have found it difficult to give a rigorous statement of the
shape of the wave that we can prove and use, therefore our rigorous arguments
in §4 do not rely on the first component of the non-rigorous argument. What
takes its place is a comparison argument between the selected process and the 
neutral process with only the mutation and resampling mechanisms. 



4. Stationary Measure of the Centred Process 

If $2q - 1 > 0$, then the distribution P tends to move to the right by mutation and
the selection mechanism also works to increase the mean fitness, therefore no
stationary measure for P can exist. If $2q - 1 < 0$, the mutation mechanism works
to decrease the mean fitness but it is not at all clear the selection mechanism
can keep the effects of deleterious mutations in check and maintain a
'mutation-selection balance'. However, the process centred about its mean does have a
stationary measure and our first task in this section is to establish this. Define

$$\hat{p}_k = p_{k + m(p)}$$

for $p \in S^{(N)}$ and $k \in \mathbb{Z}/N$, so that $m(\hat{p}) = 0$ for all p. Define

$$\hat{S}^{(N)} = \{\hat{p} : \text{there is some } p \in S^{(N)} \text{ such that } \hat{p}_k = p_{k + m(p)} \text{ for all } k \in \mathbb{Z}/N\}.$$

We observe that every $\tilde p \in \tilde S^{(N)}$ has all its mass on points spaced 1 apart
and furthermore, the centred process $\tilde P$ is irreducible, i.e. all states in $\tilde S^{(N)}$
communicate. To get from any state $\tilde p_1$ to any $\tilde p_2$, it suffices to first get to a
state where all individuals have the same fitness type. For example, the following
event ensures that at time $t + h$, all individuals will have the same number of
mutations as carried by individual 1 at time $t$:

max A <f> (*, t + h}+ A<?>(f , t + h] + {t,t + h]=0 

minAl^ ) (t,t + /i] > 1, max V A^ } (t, f + /i] = 0. (11) 

3>1 

Then one can get to any configuration in $\tilde S^{(N)}$ by the mutation mechanism
alone. The fact that the event in (11) has positive probability also ensures that
the centred process is positive recurrent. By standard results, e.g. Theorem 3.5.3
of Norris (1997), the centred process $\tilde P$ is ergodic.

Proposition 4.1. The centred process $(\tilde P, X - m(P))$ is ergodic, i.e. there is
a unique stationary measure $\pi$ and, regardless of initial condition, the chain
converges to the stationary measure as $t \to \infty$.

From now on, we take

\[ \tilde S^{(N)} = \{\tilde p : \text{there is some } p \in S^{(N)} \text{ and } l \in \mathbb{Z}/N \text{ such that } \tilde p_k = p_{k+l} \text{ for all } k \in \mathbb{Z}/N\} \]

to be our state space for the process P, because we may wish to start the
process with an initial configuration that has all its mass spaced 1 apart but
not necessarily falling onto $\mathbb{Z}$. Let $E^\pi$ denote the expectation started from the
stationary measure $\pi$. Let $T(t)$ be the semigroup associated with the process
(P, X), then since

\[ \int E^p[c_2(P(u))]\, d\pi(p) = \int T(u)c_2(p)\, d\pi(p) = \int c_2(p)\, d\pi(p), \]

we have

\begin{align*}
E^\pi[m(P(t))] &= \int_0^t \int E^p[\mu(2q-1) + s c_2(P(u))]\, d\pi(p)\, du \\
&= \mu(2q-1)t + s\int_0^t \int E^p[c_2(P(u))]\, d\pi(p)\, du \\
&= (\mu(2q-1) + sE^\pi[c_2])\,t. \tag{12}
\end{align*}

Thus it suffices to estimate $E^\pi[c_2]$ in order to get a handle on the asymptotic
speed at which m(P) increases. Such an approach resembles the one taken by
Desai and Fisher (2007). However, we have found it difficult to estimate $E^\pi[c_2]$.
Instead, we use the relation between the speed of the mean and that of the front
of the wave. For that, we define

\[ k_c(p) = \max\{k : N p_{[k,\infty)} \ge \log^2 N\}, \tag{13} \]

which we view as the location of the front of the wave. Since $k_c(p) - m(p) = k_c(\tilde p)$,
we arrive at the following

Proposition 4.2. For all $t \ge 0$, with the stationary measure $\pi$ of the centred
process as the initial condition for the non-centred process P, we have

\[ E^\pi[k_c(P(t)) - k_c(P(0))] = E^\pi[m(P(t))]. \]

Roughly speaking, the above proposition states that the speed of the front of
the wave is exactly the same as that of its mean, which seems obvious
if the wave is of fixed shape. In the present setting, however, the shape of the
wave is stochastic and this equality holds under the stationary measure of the
centred wave. The idea of relating the behaviour of the mean and the front of the
wave has been used in our non-rigorous argument in §3 as well as in Rouzine
et al. (2003). The idea of our main theorem, Theorem 4.6 below, is to start
the process P(t) from the stationary measure of the centred process and obtain
an $O(\log^{1-\delta} N)$ lower bound for the mean fitness of the population by time 1,
as long as the proportion of beneficial mutations q is strictly positive. In this
case, for large enough population sizes, the mean fitness of the population will
increase at a rate roughly proportional to $\log N$. All results in what follows are
valid for sufficiently large N, which we may not explicitly state all the time.
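
To make the definition of the front in (13) concrete, the following sketch computes the mean $m(p)$, the front location $k_c(p)$ and a second moment for a population given as a list of fitness types. It is only an illustration: the function names are ours, we assume that $c_2(p)$ denotes the second central moment of p, and we read the threshold in (13) as "at least $\log^2 N$ individuals at or beyond k".

```python
import math


def mean_fitness(x):
    """m(p): mean of the empirical measure of the fitness types x."""
    return sum(x) / len(x)


def second_central_moment(x):
    """Second central moment of the empirical measure (our reading of c_2(p))."""
    m = mean_fitness(x)
    return sum((xi - m) ** 2 for xi in x) / len(x)


def front_location(x):
    """k_c(p): the largest k such that at least log^2 N individuals have fitness >= k."""
    n = len(x)
    needed = max(math.ceil(math.log(n) ** 2), 1)  # smallest integer count >= log^2 N
    ranked = sorted(x, reverse=True)
    return ranked[needed - 1]


if __name__ == "__main__":
    population = [0, 0, 1, 1, 2, 2, 3, 3, 4, 5]
    m = mean_fitness(population)
    centred = [xi - m for xi in population]
    # Centring shifts the front by exactly m(p), i.e. k_c(p) - m(p) = k_c of the centred population.
    print(front_location(population), m, front_location(centred))
```

The last line of the usage example checks the identity relating the front of a population to the front of its centred version, which is the observation that leads to Proposition 4.2.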

We first state three results that are needed for the proof of Theorem 4.6 below.
Lemma 4.3 gives estimates on how far $k_c(P)$ can retreat on sets of very small
probability, while Lemma 4.4 compares the selected process with a neutral
process to establish that if the population starts at time 0 with at least M
(whose value range is specified in Lemma 4.4) individuals with fitness types
$\ge K_0$, then the population is expected to have at least $\log^2 N$ individuals with
fitness types

\[ \ge K_0 + 1.8\log^{1-\epsilon} M, \]

where we observe that $C_\mu(\sqrt{c_2(p)N^3} + N^2)e^{-Me^{-2(1+\mu)}/4}$ in the statement of
Lemma 4.4 is a very small correction factor. Hence $k_c(P(1))$ is expected to be at
least $K_0 + \log^{1-\epsilon} M$. Finally, Proposition 4.5 states that for any initial condition
$p \in \tilde S^{(N)}$, the front is expected to advance at least $1.7\log^{1-5\beta} N$ minus a small
correction factor.

Lemma 4.3. Let $A^{(N)} \subset \tilde S^{(N)}$. Let $B^{(N)}$ be an event that satisfies
$P^p(B^{(N)}) \le \epsilon(N)$ for all $p \in A^{(N)}$, where $\epsilon(N) \to 0$ as $N \to \infty$. If N is
sufficiently large, then for any $p \in A^{(N)}$,

\[ E^p\Big[\inf_{t \in [0,1]} \min_{j=1,\ldots,N} (X_j(t) - X_j(0))\, 1_{B^{(N)}}\Big] \ge -C_\mu\big(\sqrt{c_2(p)N^3} + N^2\big)\epsilon(N)^{1/2}, \]

where $C_\mu$ is a constant depending only on $\mu$. In particular, if N is sufficiently
large, then for any $p \in A^{(N)}$,

\[ E^p\Big[\inf_{t \in [0,1]} (k_c(P(t)) - k_c(p))\, 1_{B^{(N)}}\Big] \ge -C_\mu\big(\sqrt{c_2(p)N^3} + N^2\big)\epsilon(N)^{1/2}. \]

The result still holds if we replace the process (P, X) by the neutral process
defined in (17).

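Lemma 4.4. Let $t_1 \in [1/2, 1]$, $K_0 \in \mathbb{Z}/N$, and $\epsilon \in (0,1)$ be fixed. Let M =
M(N) be a constant that depends on N such that

\[ \frac{M}{e^{\log^{1-0.9\epsilon} M}\,\log^2 N} \to \infty \]

as $N \to \infty$. If N is sufficiently large, then for any $p \in \tilde S^{(N)}$ with $p_{[K_0,\infty)} \ge M/N$, we have

\[ E^p\Big[\inf_{t \in [t_1,1]} k_c(P(t))\Big] \ge K_0 - C_\mu\big(\sqrt{c_2(p)N^3} + N^2\big)e^{-Me^{-2(1+\mu)}/4} + 1.8\log^{1-\epsilon} M. \]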

Proposition 4.5. Let $\mu > 0$, $q > 0$ and $s > 0$ be fixed. If N is sufficiently large,
then for any $\beta > 0$ and $p \in \tilde S^{(N)}$,

W[k c (P(l)) - k c (p)} > lJlog 1 - 5 ' 3 ^ 

-C„ (Vc 2 (p)^V 3 + [Vc 2 (P(to))^V 3 

w/iere i = - log"' 3 iV. 

Theorem 4.6. Let $\mu > 0$, $q > 0$ and $s > 0$ be fixed. Then for any $\beta > 0$,

\[ E^\pi[m(P(1))] \ge \log^{1-5\beta} N \]

if N is sufficiently large.

Proof. We combine Propositions 4.2 and 4.5 to obtain
E*[m(P(l))] = E"[fc c (P(l)) - fc c (p)] 

> 1.7 log 1 " 5 ' 3 A^ - C M (n 3 ' 2 E* [^2"+ V / c 2 (P(io))] +A^ 2 )e-5i°g 

= 1.71og 1 ~ 5/3 Cf, (2N 3/2 E 7 '[^\+ N 2 ^j e^ lo; 
But from (12), we have

\[ E^\pi[m(P(1))] = \mu(2q-1) + sE^\pi[c_2]. \]

Hence

(s + 2C M iV 3 / 2 e-5 lo s 2 N )W[c 2 ] > 1.7 log 1 " 5 ' 5 JV - fi(2q - 1) - Cj.JVV* Iog2 * 
which implies that 

E^c,] > — log 1 " 5 ' 3 TV 
s 

for sufficiently large N. The desired result follows. □ 

The rest of this work is devoted to the proof of Proposition 4.5, which makes
use of Lemmas 4.3 and 4.4. We define

\[ L = \log^{1-3\beta} N, \qquad k_d(p) = \max\{k : N p_{[k,\infty)} \ge e^{\log^{1-2\beta} N}\}. \]

The number of individuals beyond $k_d$, $e^{\log^{1-2\beta} N}$, is much larger than the number
beyond $k_c$ (which is $\log^2 N$) but nevertheless is only a tiny proportion of the
entire population. The basic idea for the proof of Proposition 4.5 is to use
Lemma 4.4, which states that if there are M individuals with fitness types larger
than $K_0$ at time 0, then $k_c(P)$ is expected to be beyond $K_0 + 1.8\log^{1-\epsilon} M$ at
time 1, where the value of $\epsilon$ does not depend on M as long as M is sufficiently
large. We can then divide into 2 cases: (i) $k_d(P) \ge k_c(p) - L$ at some point before a small
time $t_0$ (event $B_1 \cup B_2$ below), and (ii) $k_d(P) < k_c(p) - L$ throughout the
time interval $[0, t_0]$ (event $B_3 \cup B_4$ below). Under case (i), a simple application
of Lemma 4.4 implies that the $e^{\log^{1-2\beta} N}$ individuals with fitness types larger
than $k_c(p) - L$ are expected to push $k_c(P)$ to beyond $k_c(p) - L + 2L$ at time
1, hence advancing $k_c(P)$ by at least L. Under case (ii), the $\log^2 N$ individuals
with fitness types larger than $k_c(p)$ will pick off individuals with fitness types
smaller than $k_c(p) - L$ (of which there are at least $N - e^{\log^{1-2\beta} N}$) via the selection
mechanism at a very fast rate, so that with very high probability by time $t_0$,
the number of individuals in $[k_c(p), \infty)$ will be at least $e^{\log^{1-4\beta} N}$. Lemma 4.4 implies that these $e^{\log^{1-4\beta} N}$
individuals will then push $k_c(P)$ forward by at least $\log^{1-5\beta} N$ by time 1. In
either case, the front of the wave moves forward at a high speed.
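
To get a feeling for the separation of scales in this outline, the sketch below tabulates the logarithms of the relevant population counts as functions of $\log N$ and $\beta$. It is purely illustrative: the function name and dictionary keys are ours, and we assume the exponents $1-2\beta$, $1-3\beta$, $1-4\beta$ and $1-5\beta$ as we read them from the outline. Lemma 4.4 is applied with $\epsilon = \beta$, so the predicted advance is $1.8\log^{1-\beta} M$.

```python
import math


def proof_scales(log_n, beta):
    """Logarithmic sizes of the quantities in the proof outline, given log N and beta."""
    log_kd_count = log_n ** (1 - 2 * beta)     # log of e^{log^{1-2 beta} N}
    log_case2_count = log_n ** (1 - 4 * beta)  # log of e^{log^{1-4 beta} N}
    return {
        "L": log_n ** (1 - 3 * beta),
        "target": log_n ** (1 - 5 * beta),
        "log_front_count": 2 * math.log(log_n),  # log of log^2 N
        "log_case2_count": log_case2_count,
        "log_kd_count": log_kd_count,
        "log_population": log_n,
        # Advance 1.8 * log^{1-beta} M from Lemma 4.4 with epsilon = beta.
        "advance_case1": 1.8 * log_kd_count ** (1 - beta),
        "advance_case2": 1.8 * log_case2_count ** (1 - beta),
    }


if __name__ == "__main__":
    for key, value in proof_scales(log_n=1e6, beta=0.1).items():
        print(f"{key:>18s} = {value:.1f}")
```

With $\log N = 10^6$ and $\beta = 0.1$, the advance in case (i) exceeds 2L and the advance in case (ii) exceeds $\log^{1-5\beta} N$. The first comparison fails for moderate values of $\log N$, which illustrates why the results are stated only for sufficiently large N.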

Proof of Proposition 14.51 We take to = - log"' 3 N and define 

T = inf{< > : k d (P(t)) > k c (p) - L} 

B 1 = {P lkaip) - L , oo) (t )>e lo ^ 2 " N ,To<to} 

B 2 = {P [fec ( P )-L,co)(to)<e log " 2 " Ar ,T <t } 

B3 - {P[ kc ( P ).o )(to)>e losl ~ i " N ,To>t } 

Bi = {P[ kc ( P ),oo)(to)<e los " 4(iN 1 To>t }. 

We will estimate $E^p[(k_c(P(1)) - k_c(p))1_B]$ for $B = B_1 \cup B_3$ and $B = B_2 \cup B_4$. For
$p \in \tilde S^{(N)}$ with $k_d(p) \ge k_c(p) - L$, $T_0 = 0$. But for those p with $k_d(p) < k_c(p) - L$,
we need to establish that the number of individuals lying in $[k_c(p), \infty)$ grows
quickly, i.e. $B_4$ has small probability. For that, we construct a set valued process
I to be dominated by the set of individuals lying in $[k_c(p), \infty)$, i.e. such that
$I(t) \subset \{i : X_i(t) \in [k_c(p), \infty)\}$ for all $t \le T_0$. Without any loss of generality,
we assume that at time 0 individuals $\{1, \ldots, \log^2 N\}$ lie in $[k_c(p), \infty)$ and define
$I(0) = \{1, \ldots, \log^2 N\}$. The mechanisms that drive the population P have the
following effect on I:

1. Mutation: if any individual $i \in I$ is hit by a deleterious mutation event,
we delete i from I.

2. Selection: at a selection event when individual $i \in I$ replaces individual j
lying in $(-\infty, k_d(P)]$ at time t, we add j to I; at a selection event when
individual $i \in I$ is replaced by individual $j \notin I$ at time t (in which case
$X_j > X_i$), we replace $i \in I$ with j.

3. Resampling: at a resampling event when individual i replaces individual
j at time t (which happens at rate $1/N$), if $i \notin I$ and $j \in I$ then we delete
j from I; if $i \in I$ and $j \notin I$ then we add j to I.

Then for $i \in I(t)$, we have $X_i(t) \in [k_c(p), \infty)$, and |I| has the following transitions:

1. Mutation: |I| decreases by 1 at rate $\mu(1-q)|I|$.

2. Selection: |I| increases by 1 at rate $\sum_{i \in I,\, j : X_j \le k_d(P)} \frac{s}{N}(X_i - X_j)^+$.

3. Resampling: |I| increases by 1 and decreases by 1, both at rate $|I|(N - |I|)/N$.

Prior to $T_0$, we have

\[ \frac{s}{N}\sum_{i \in I,\, j : X_j \le k_d(P)} (X_i - X_j)^+ \ge \frac{s}{N}\,|I|\,L\,\big(N - e^{\log^{1-2\beta} N}\big) \ge 0.9\,s|I|L \]

for sufficiently large N.

Let Z be an integer valued jump process with initial condition $Z(0) = \log^2 N$
and the following transitions:

1. Z increases by 1 at rate $0.9sLZ$,

2. Z decreases by 1 at rate $(\mu + 1)Z$,

then Z is dominated by |7| before To- By Lemma I572"b , if we take to = - log _/3 AT, 
which is > o 9s£ 1 _ M _i (log + 1 °g 1 ~ 4 ' 3 N ) for sufficiently large N, then 

W(7(t \ < pW-^iV^ < 1 ( 4 (M + 1) V° S N 

p(z(<o) - e } - (1 _ e - logl -^ )el „ gl ^„ \-^r) 

< Ce (log 2 Af)(logC-logL) 

< Ce^ log2jv . 

Since |I| dominates Z (i.e. $|I(t)| \ge Z(t)$) and I is dominated by the set of
individuals lying in $[k_c(p), \infty)$ before $T_0$, we have

\[ P^p(B_4) \le Ce^{-\log^2 N} \tag{14} \]

for all $p \in \tilde S^{(N)}$.

Now we turn to the event $B_2$. Without any loss of generality, we assume at
time $T_0$, individuals in $A_0 = \{1, 2, \ldots, \lceil e^{\log^{1-2\beta} N} \rceil\}$ have fitness $\ge k_c(p) - L$.
During the time period $[T_0, t_0]$, the number of resampling events where individual
$i \in A_0$ gets replaced by another individual is Poisson$(\frac{N-1}{N}(t_0 - T_0))$,
so $X_i$ remains untouched by a resampling event during [0, 1] with probability
$\ge e^{-1}$. Furthermore, no deleterious mutation event falls on $X_i$ during $[T_0, t_0]$
with probability $e^{-(1-q)\mu(t_0 - T_0)} \ge e^{-\mu}$. Let

\[ A_1 = \{i \in A_0 : X_i \text{ remains untouched by a resampling event or a deleterious mutation event during } [T_0, t_0]\}, \]


/2 



(15) 
(16) 



then I Ai | dominates Binomial(\e l ° e N ~\,e ( 1 +A')), Bv Lemma l5.1b . ifp[ar 0|0O ) > 
M/N, then 

Iob- 1- & N —2 

P P {B 2 ) <e- e 

Combining this and (14) implies

\[ P^p(B_2 \cup B_4) \le Ce^{-\log^2 N}. \]

Hence by Lemma 4.3 we have, for any $p \in \tilde S^{(N)}$,

\[ E^p[(k_c(P(1)) - k_c(p))1_{B_2 \cup B_4}] \ge -C_\mu\big(\sqrt{c_2(p)N^3} + N^2\big)e^{-\frac{1}{2}\log^2 N}. \]

Finally we turn to events $B_1$ and $B_3$. Both of these events, unlike $B_2$ and $B_4$,
will turn out to make a large and positive contribution to the rate of adaptation,
and even though we have no estimates on their probabilities, we expect neither
to tend to as TV — > 00. On B\, there are more than iVe log N individuals in 
[k c (p) — L, 00) at time to. And on P3, at time to, there are more than iVe logl lf< N 
individuals in [k c (p), 00), therefore for any p G 



E" [(fc c (P(l)) - fc c (p))l Bl 



UB3J 



w [{W[k c {P{i))\& t0 \ - k c {p)} i BlUB3 



= E p 
> E p 



1UB3 



E p ^[k c (P(l-t ))]-k c (p)}l Bl 

L-Cp (Vc 2 (T(to))iV 3 + iV 2 ) e -^- 2(1+ "V8 + L8 log i-/» I 

-C, Uc 2 {P{t )W + NA e -e-— e-^)/4 



+1.81og i_p (e 



1-/3/ log 1 - 4 " JV- 



> 1.8{log 1 - 5l3 N)P p {B 1 (JB 3 )-C t ,e~ los ' N E p ^/c 2 (P(t ))N 3 + 
where in the first $\ge$, we use Lemma 4.4 twice, with $K_0 = k_c(p) - L$, $M = N/2$,
and $\epsilon = \beta$ for event $B_1$, and with $K_0 = k_c(p)$, $M = e^{\log^{1-4\beta} N}$, and $\epsilon = \beta$ for
event $B_3$.

We combine the two estimates above to obtain that if N is sufficiently large,
then for any $p \in \tilde S^{(N)}$,

W [k c (P(l)) - k c (p)] 

= W [(fe c (P(l)) - fc c (p))l BlU B 3 ] + E P [(fcc(P(l)) " fc c (p))l S2 uB 4 ] 

> 1.8(log 1 - 5/3 7V)P p ((B 2 U P 4 ) c ) - C^e- log2 w E p [vM^oM 3 + N 2 



> l.7{\oi-^ N) - (y^(pjN~ 3 + E p [^/c 2 (P(t ))N^ + N 2 ) e 



e 

2\ -±log 2 JV 



where we use (16) in the last inequality. Hence we have the desired result. □



5. Proof of Supporting Lemmas 

The lemmas in this section are needed for the proof of Proposition 4.5. Lemma 5.1
gives large deviation estimates for binomial and Poisson random variables.
Lemma 5.2 establishes a few results on a birth-death process, which we will
use to show that fit individuals pick off unfit individuals very quickly via the
selection mechanism. We then prove Lemmas 4.3 and 4.4.

Lemma 5.1. (a) Suppose $Z \sim$ Binomial$(n, \gamma)$, then $P(Z \le n\gamma/2) \le e^{-n\gamma^2/2}$.

(b) Suppose A > is fixed and Z ~ Poisson(X), then P(Z > n) > e ^ 2 J" (^)"- 

In particular, if $\epsilon > 0$ is fixed, then $P(Z \ge 2\log^{1-\epsilon} M) \ge c_{(1)}\exp(-\log^{1-0.9\epsilon} M)$
for some constant $c_{(1)}$ and sufficiently large M.

(c) Suppose $Z \sim$ Poisson$(N\mu)$, then $P(Z \ge N^2) \le Ce^{-N\log N}$.
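
Parts (a) and (c) are easy to sanity-check numerically. The sketch below compares the exact binomial lower tail $P(Z \le n\gamma/2)$ with the bound $e^{-n\gamma^2/2}$, and bounds the logarithm of the Poisson upper tail $P(Z \ge N^2)$ for $Z \sim$ Poisson$(N\mu)$ by a geometric series on the ratio of consecutive probabilities. This is an illustration rather than a proof; the function names are ours and the parameter values in the usage example are arbitrary.

```python
import math


def binomial_lower_tail(n, gamma, k):
    """Exact P(Z <= k) for Z ~ Binomial(n, gamma)."""
    return sum(math.comb(n, j) * gamma**j * (1 - gamma) ** (n - j) for j in range(k + 1))


def hoeffding_bound(n, gamma):
    """The bound exp(-n * gamma^2 / 2) of part (a)."""
    return math.exp(-n * gamma**2 / 2)


def log_poisson_pmf(lam, n):
    """log P(Z = n) for Z ~ Poisson(lam)."""
    return -lam + n * math.log(lam) - math.lgamma(n + 1)


def log_poisson_upper_tail(lam, n):
    """Upper bound on log P(Z >= n), valid when lam < n + 1.

    For k >= n, P(Z = k + 1) / P(Z = k) = lam / (k + 1) <= lam / (n + 1),
    so the tail is at most P(Z = n) / (1 - lam / (n + 1)).
    """
    ratio = lam / (n + 1)
    if ratio >= 1:
        raise ValueError("the geometric bound needs lam < n + 1")
    return log_poisson_pmf(lam, n) - math.log(1 - ratio)


if __name__ == "__main__":
    for n, gamma in [(50, 0.3), (200, 0.1), (1000, 0.05)]:
        k = math.floor(n * gamma / 2)
        print(n, gamma, binomial_lower_tail(n, gamma, k), hoeffding_bound(n, gamma))
    mu = 2.0
    for n_pop in [10, 100, 1000]:
        print(n_pop, log_poisson_upper_tail(mu * n_pop, n_pop**2), -n_pop * math.log(n_pop))
```

In the Poisson comparison the logarithm of the tail is of order $-N^2\log N$, far below $-N\log N$, so the bound in (c) is crude but sufficient for the purposes of Lemma 4.3.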

Proof. (a) We use Hoeffding's inequality (Hoeffding 1963) to prove this:

Let $X_1, \ldots, X_n$ be i.i.d. random variables taking values in [a, b]. Let
$U = X_1 + \ldots + X_n$ and $t > 0$, then $P(U - E[U] \ge nt) \le e^{-2nt^2/(b-a)^2}$.

We regard the binomial random variable $n - Z$ as a sum of n independent
Bernoulli$(1 - \gamma)$ random variables, then

\begin{align*}
P(Z \le n\gamma/2) &= P(n - Z \ge n(1 - \gamma/2)) \\
&= P((n - Z) - n(1 - \gamma) \ge n(1 - \gamma/2) - n(1 - \gamma)) \\
&= P((n - Z) - n(1 - \gamma) \ge n\gamma/2) \\
&\le e^{-n\gamma^2/2}
\end{align*}

by Hoeffding's inequality.
(b) By Stirling's formula (see e.g. page 257 of Abramowitz & Stegun 1965),
$k! \le \sqrt{2\pi}\, k^{k+1/2} e^{-k + 1/(12k)}$ for any positive integer k. Therefore for $n \ge 1$,

00 e -X\k 00 \fe 

p(z >n) = y e —±- > e - x y —A - ; 

k—n k—n 



We take n = 21og 1_e M, then for sufficiently large M, 



V2tF WJ ~ (4A- 1 log 2 - 2£ M) 21o g 1 " £M 

C(l) 



cm exp{-(log 1 - a9e M)(21og- ale M)(log(4A- 1 ) + (2 - 2e) loglogM))} 



exp{(21og i - £ M) log(4A^ 1 log^ e M)} 
"(i) 

> c^expC-log 1 " - 9 ^). 
(c) We take n = N 2 , then 

P(Z = n) = e -^i^l < f < C ( 1-Y = Ce~ N ^ N , 

where we apply Stirling's formula $k! \ge \sqrt{2\pi}\, k^{k+1/2} e^{-k} \ge c\sqrt{k}(k/e)^k$. Consequently

P{ z>n) = e ^yM< e ^M:y(V t 

k=n k=0 v 

= P(Z = n) n <Ce- Nl °z N , 

as required. □ 

Lemma 5.2. Let Z be an integer valued jump process with initial condition
$Z(0) = Z_0 > 0$ and the following transitions:

1. Z increases by 1 at rate $aZ$,

2. Z decreases by 1 at rate $bZ$,

where $a, b > 0$ and $a \ne b$, then
(a) For $x \in [0, 1)$,

\[ G(x,t) = E(x^{Z_t}) = \left(\frac{b(x-1) - (ax-b)e^{-(a-b)t}}{a(x-1) - (ax-b)e^{-(a-b)t}}\right)^{Z_0}. \]

(b) If $a > b$, $M > 1$ and $t \ge (\log 2 \vee \log(aM/b))/(a-b)$, then

\[ P(Z(t) \le k) \le \frac{1}{(1-1/M)^k}\left(\frac{4b}{a}\right)^{Z_0}. \]

Proof. (a) It can be shown that G(x, t) satisfies

\[ \frac{\partial}{\partial t}G(x,t) = (ax-b)(x-1)\frac{\partial}{\partial x}G(x,t) \]

and that the given G(x, t) satisfies this PDE with initial condition $G(x,0) = x^{Z_0}$;
see for example Theorem 6.11.10 in Grimmett & Stirzaker (1992).
(b) We take $x = 1 - 1/M$ and apply Markov's inequality to obtain

\begin{align*}
P(Z(t) \le k) &= P(x^{Z(t)} \ge x^k) \le \frac{E[x^{Z(t)}]}{x^k} \\
&= \frac{1}{(1-1/M)^k}\left(\frac{b + (a(M-1) - bM)e^{-(a-b)t}}{a + (a(M-1) - bM)e^{-(a-b)t}}\right)^{Z_0} \\
&\le \frac{1}{(1-1/M)^k}\left(\frac{b + aMe^{-(a-b)t}}{a - ae^{-(a-b)t}}\right)^{Z_0},
\end{align*}

where in the last inequality, we use the assumptions $M > 1$ and $a > b$ to deduce
that $(aM - bM)e^{-(a-b)t} \ge 0$. Since $t \ge (\log 2 \vee \log(aM/b))/(a-b)$, we have
$aMe^{-(a-b)t} \le b$ and $ae^{-(a-b)t} \le a/2$. Therefore

\[ P(Z(t) \le k) \le \frac{1}{(1-1/M)^k}\left(\frac{4b}{a}\right)^{Z_0}, \]

as required. □
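
The two parts of Lemma 5.2 can be checked numerically. The following sketch evaluates the generating function of part (a) and compares the Markov bound $G(x,t)/x^k$ at $x = 1 - 1/M$ with the closed-form bound of part (b). The function names and the parameter values in the usage example are ours and serve only as an illustration.

```python
import math


def pgf(x, t, a, b, z0):
    """G(x, t) = E[x^{Z_t}] for the linear birth-death process of Lemma 5.2(a)."""
    decay = math.exp(-(a - b) * t)
    num = b * (x - 1) - (a * x - b) * decay
    den = a * (x - 1) - (a * x - b) * decay
    return (num / den) ** z0


def tail_bounds(k, t, a, b, z0, m):
    """Return (Markov bound, closed-form bound) on P(Z(t) <= k) from Lemma 5.2(b)."""
    if not (a > b > 0 and m > 1):
        raise ValueError("need a > b > 0 and M > 1")
    t_min = max(math.log(2), math.log(a * m / b)) / (a - b)
    if t < t_min:
        raise ValueError("t is below the threshold required in part (b)")
    x = 1 - 1 / m
    markov = pgf(x, t, a, b, z0) / x**k
    closed_form = (4 * b / a) ** z0 / x**k
    return markov, closed_form


if __name__ == "__main__":
    # At t = 0 the generating function reduces to x^{Z_0}.
    print(pgf(0.3, 0.0, 2.0, 1.0, 3), 0.3**3)
    print(tail_bounds(k=20, t=1.0, a=5.0, b=1.0, z0=10, m=4.0))
```

The closed-form bound is much weaker than the Markov bound it is derived from, but it has the advantage of not depending on t once t exceeds the threshold, which is what the proof of Proposition 4.5 uses.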

Before we prove Lemmas 4.3 and 4.4, we first construct a process Y consisting
of individuals that undergo the mutation and resampling mechanisms of §2 but
not the selection mechanism. Let $Y_i(t) \in \mathbb{Z}/N$, $i = 1, \ldots, N$, denote the number
of mutations present in the $i$th individual in the population, then

Yi(f) = Yi(0)+ [\W( du) _ [\W( du) 



\ ''(Y^-Yiu-M^idu). (17) 



3 







Let $P^{(Y)}(t)$ be the empirical measure formed by the N individuals of the process
Y. Since we use the same Poisson random measures and Poisson processes to
construct X and Y, we have $Y_i(t) \le X_i(t)$ for all $t \ge 0$ and $i = 1, \ldots, N$,
provided $Y_i(0) \le X_i(0)$ for all i at time 0.
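
The domination of Y by X can be illustrated by simulating both processes with shared randomness. The sketch below is a simplified stand-in for the Poisson random measure construction, and its model details are our assumptions: each individual mutates at rate $\mu$ (a step of +1 with probability q, otherwise -1), each ordered pair (i, j) undergoes resampling at rate 1/N, and for $X_i > X_j$ individual i replaces j by selection at rate $(s/N)(X_i - X_j)$. Mutation and resampling events are applied to both processes, selection events to X only. The function name `simulate_coupled` is ours.

```python
import random


def simulate_coupled(n, mu, q, s, t_end, seed=0):
    """Gillespie simulation of the selected process X and the neutral process Y, coupled."""
    rng = random.Random(seed)
    x = [0] * n  # selected process
    y = [0] * n  # neutral process, started from the same configuration
    t = 0.0
    while True:
        # Selection events depend on the current state of X only.
        sel = [(i, j, s / n * (x[i] - x[j]))
               for i in range(n) for j in range(n) if x[i] > x[j]]
        sel_rate = sum(rate for _, _, rate in sel)
        mut_rate = mu * n
        res_rate = n - 1  # n(n-1) ordered pairs, each at rate 1/n
        total = mut_rate + res_rate + sel_rate
        t += rng.expovariate(total)
        if t > t_end:
            return x, y
        u = rng.random() * total
        if u < mut_rate:
            i = rng.randrange(n)
            step = 1 if rng.random() < q else -1
            x[i] += step  # the same mutation hits both processes
            y[i] += step
        elif u < mut_rate + res_rate:
            i, j = rng.sample(range(n), 2)
            x[j] = x[i]  # the same resampling event hits both processes
            y[j] = y[i]
        else:
            i, j, _ = rng.choices(sel, weights=[rate for _, _, rate in sel])[0]
            x[j] = x[i]  # selection acts on X only and can only raise X_j


if __name__ == "__main__":
    x, y = simulate_coupled(n=20, mu=1.0, q=0.3, s=0.5, t_end=2.0, seed=1)
    print(all(yi <= xi for xi, yi in zip(x, y)), sum(x) / len(x), sum(y) / len(y))
```

Each type of event preserves the ordering $Y_i \le X_i$: mutations shift both coordinates by the same amount, resampling copies a dominated pair onto another index, and selection only increases a coordinate of X.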

Proof of Lemma 4.3. We prove the result for the neutral process $(P^{(Y)}, Y)$,
then since X dominates Y, we have the desired result for (P, X) as well. Let

\[ U = \inf_{t \in [0,1]} \min_{i=1,\ldots,N} (Y_i(t) - Y_i(0)) \]

and $p \in A^{(N)}$ be the initial configuration of the population. We only need a
crude estimate on $E^p[U 1_{B^{(N)}}]$. Let $V_1$ be the total number of mutation events
(both deleterious and beneficial) and $V_2$ be the total number of resampling
events that fall on all individuals during [0, 1], then $V_1 \sim$ Poisson$(N\mu)$ and
$V_2 \sim$ Poisson$(2N)$. Let

\[ k_w(p) = \max\{k - l : p_k \ne 0,\ p_l \ne 0\} \]

be the width of the support of p. Since the resampling mechanism does not
increase the width of the support, $k_w(P(t)) \le k_w(p) + V_1$ for all $t \in [0, 1]$. The
most any individual's fitness can decrease due to a resampling event at t is
$k_w(P(t))$, hence

\[ -U \le V_2(k_w(p) + V_1) + (V_2 + 1)V_1, \]

where the first term on the right accounts for the possible decrease in fitness
due to each of the $V_2$ resampling events and the second term accounts for the
possible decrease due to mutation events between resampling events. Hence by
Hölder's inequality, for any $p \in A^{(N)}$,

\begin{align*}
E^p[|U| 1_{B^{(N)}}] &\le k_w(p)\,(E^p[V_2^2])^{1/2}\,(P^p(B^{(N)}))^{1/2} \\
&\quad + (E^p[(2V_2 + 1)^4])^{1/4}\,(E^p[V_1^4])^{1/4}\,(P^p(B^{(N)}))^{1/2} \\
&\le C_\mu(k_w(p)N + N^2)\,\epsilon(N)^{1/2}.
\end{align*}

Since $c_2(p) \ge \frac{1}{N}(k_w(p)/2)^2$ for any $p \in \tilde S^{(N)}$, we have the desired result. □
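
The last step of the proof relies on the elementary bound $c_2(p) \ge \frac{1}{N}(k_w(p)/2)^2$: at least one of the two extreme individuals lies at distance $k_w(p)/2$ or more from the mean, and that individual alone contributes $\frac{1}{N}(k_w(p)/2)^2$ to the second moment. The sketch below checks this on random configurations. It assumes, as our reading of the notation, that $c_2(p)$ is the second central moment of the empirical measure; the function names are ours.

```python
import random


def second_central_moment(x):
    """Second central moment of the empirical measure of x (our reading of c_2(p))."""
    m = sum(x) / len(x)
    return sum((xi - m) ** 2 for xi in x) / len(x)


def support_width(x):
    """k_w(p): distance between the largest and smallest occupied fitness types."""
    return max(x) - min(x)


def width_bound_holds(x):
    """Check c_2(p) >= (1/N) * (k_w(p) / 2)^2 for the population x."""
    return second_central_moment(x) >= (support_width(x) / 2) ** 2 / len(x)


if __name__ == "__main__":
    rng = random.Random(0)
    populations = [[rng.randint(-10, 10) for _ in range(rng.randint(1, 30))]
                   for _ in range(200)]
    print(all(width_bound_holds(x) for x in populations))
```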
Proof of Lemma 4.4. First we observe that the requirement

M 

el o gl -°^M log2jv ^a 8 ^ 00 ( 18 ) 

implies 

M e -( 1 +") > log 2 N (19) 

for sufficiently large N. Let Y be the neutral process defined in (17). If $p \in \tilde S^{(N)}$
and $p_{[K_0,\infty)} \ge M/N$, then at least M individuals lie in $[K_0, \infty)$. Without any
loss of generality, we assume individuals $1, \ldots, M$ lie in $[K_0, \infty)$. We take the
initial condition $Y_i(0) = X_i(0)$ for all $i = 1, \ldots, N$, then $Y_i(t) \le X_i(t)$ for all
$t \ge 0$ and $i = 1, \ldots, N$. Let

\[ A_2 = \{i \in \{1, \ldots, M\} : Y_i \text{ remains untouched by a resampling event or a deleterious mutation event during } [0,1]\}, \]

then $A_2$ is measurable with respect to the filtration generated by the resampling
and deleterious mutation events during the time period [0, 1] and independent of the
filtration generated by the beneficial mutation events. Furthermore, the same argument used for (15) implies that the
distribution of $|A_2|$ dominates Binomial$(M, e^{-(1+\mu)})$. For $p \in \tilde S^{(N)}$, we write



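\begin{align*}
E^p\Big[\inf_{t \in [t_1,1]} k_c(P^{(Y)}(t))\Big] = {} & E^p\Big[\inf_{t \in [t_1,1]} k_c(P^{(Y)}(t)) \,\Big|\, |A_2| \ge \tfrac{M}{2}e^{-(1+\mu)}\Big]\, P^p\big(|A_2| \ge \tfrac{M}{2}e^{-(1+\mu)}\big) \\
& + E^p\Big[\inf_{t \in [t_1,1]} k_c(P^{(Y)}(t))\, 1_{\{|A_2| < Me^{-(1+\mu)}/2\}}\Big]. \tag{20}
\end{align*}

By Lemma 5.1(a), if $p_{[K_0,\infty)} \ge M/N$, then

\[ P^p\big(|A_2| < \tfrac{M}{2}e^{-(1+\mu)}\big) \le e^{-Me^{-2(1+\mu)}/2}. \tag{21} \]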



We first deal with the conditional expectation involving the event $\{|A_2| \ge
\frac{M}{2}e^{-(1+\mu)}\}$ in (20). We observe that for the process Y and individuals in $A_2$, any
change in their fitness is due to the beneficial mutation mechanism and therefore
can only increase in time during [0, 1]. The number of beneficial mutations that
fall on any individual during $[0, t_1]$ is distributed Poisson$(q\mu t_1)$ and since $t_1 \ge
1/2$, it dominates Poisson$(q\mu/2)$. Furthermore, it depends only on the beneficial mutation events and
therefore is independent of the set valued random variable $A_2$. Let $K_1$ be the number
of individuals in $A_2$ that have their fitness types increase by at least $2\log^{1-\epsilon} M$
during $[0, t_1]$. If $K_1 \ge \log^2 N$, then $\inf_{t \in [t_1,1]} k_c(P^{(Y)}(t)) \ge K_0 + 2\log^{1-\epsilon} M$.
Lemma 5.1(b) with $\lambda = q\mu/2 > 0$ implies the following: conditioning on $|A_2|$, the
distribution of $K_1$ dominates Binomial$(|A_2|, c_{(1)}\exp(-\log^{1-0.9\epsilon} M))$ for some
constant $c_{(1)}$, and then we apply Lemma 5.1(a) to obtain



? p ( inf k c (P {Y) (t)) >K + 2 log 1 " 6 M 
\te[ti,x] 

l^l>fe-(^)) 



\A 2 \ > — e 



-(i+m) 



: '" j [\\ > log 2 N 



> P p if x > 



-log 



)E M 



, , , M 



> 1 — cxp — 



c 2 M 



,-(l+/x) p -21og 



9E M 



> 1 — cxp ( -C(2je 



lo g M-2 1og 1 " 9e M 



0.91ogMN 



(22) 



> 1 - cxp (-C( 2 )e 

where $c_{(2)}$ is a constant and we use (19) in the second inequality. By (18),
$\frac{M}{2}e^{-(1+\mu)} \ge \log^2 N$ for sufficiently large N, hence $\inf_{t \in [t_1,1]} k_c(P^{(Y)}(t)) \ge K_0$
on the event $\{|A_2| \ge \frac{M}{2}e^{-(1+\mu)}\}$. Therefore (22) implies



\[ E^p\Big[\inf_{t \in [t_1,1]} k_c(P^{(Y)}(t)) \,\Big|\, |A_2| \ge \tfrac{M}{2}e^{-(1+\mu)}\Big] \ge K_0 + 1.9\log^{1-\epsilon} M. \]






Now we deal with the expectation in (20) involving the event $\{|A_2| < \frac{M}{2}e^{-(1+\mu)}\}$
which, by (21), has probability $\le e^{-Me^{-2(1+\mu)}/2}$ if $p \in \tilde S^{(N)}$ and $p_{[K_0,\infty)} \ge M/N$.
We observe that for such p, there are more than $\log^2 N$ individuals with fitness
types $\ge K_0$ at time 0, therefore $k_c(p) \ge K_0$. Hence Lemma 4.3 implies

\[ E^p\Big[\inf_{t \in [t_1,1]} \big(k_c(P^{(Y)}(t)) - K_0\big)\, 1_{\{|A_2| < Me^{-(1+\mu)}/2\}}\Big] \ge -C_\mu\big(\sqrt{c_2(p)N^3} + N^2\big)e^{-Me^{-2(1+\mu)}/4} \]

if $p_{[K_0,\infty)} \ge M/N$. Plugging the above two estimates along with (21) into (20)
yields for p with $p_{[K_0,\infty)} \ge M/N$,



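\begin{align*}
E^p\Big[\inf_{t \in [t_1,1]} k_c(P^{(Y)}(t))\Big] \ge {} & \big(K_0 + 1.9\log^{1-\epsilon} M\big)\, P^p\big(|A_2| \ge \tfrac{M}{2}e^{-(1+\mu)}\big) \\
& + E^p\Big[\inf_{t \in [t_1,1]} \big(k_c(P^{(Y)}(t)) - K_0\big)\, 1_{\{|A_2| < Me^{-(1+\mu)}/2\}}\Big] + K_0\, P^p\big(|A_2| < \tfrac{M}{2}e^{-(1+\mu)}\big) \\
\ge {} & K_0 - C_\mu\big(\sqrt{c_2(p)N^3} + N^2\big)e^{-Me^{-2(1+\mu)}/4} \\
& + \big(1.9\log^{1-\epsilon} M\big) \inf_{p \in \tilde S^{(N)} :\, p_{[K_0,\infty)} \ge M/N} P^p\big(|A_2| \ge \tfrac{M}{2}e^{-(1+\mu)}\big) \\
\ge {} & K_0 - C_\mu\big(\sqrt{c_2(p)N^3} + N^2\big)e^{-Me^{-2(1+\mu)}/4} + 1.8\log^{1-\epsilon} M.
\end{align*}

Since X dominates Y, we have the desired result. □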